Chinese Word Sense Induction with Basic Clustering Algorithms

نویسندگان

  • Yuxiang Jia
  • Shiwen Yu
  • Zhengyan Chen
چکیده

Word Sense Induction (WSI) is an important topic in natural langage processing area. For the bakeoff task Chinese Word Sense Induction (CWSI), this paper proposes two systems using basic clustering algorithms, k-means and agglomerative clustering. Experimental results show that k-means achieves a better performance. Based only on the data provided by the task organizers, the two systems get FScores of 0.7812 and 0.7651 respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010

Recent studies on word sense induction (WSI) mainly concentrate on European languages, Chinese word sense induction is becoming popular as it presents a new challenge to WSI. In this paper, we propose a feature-based approach using the spectral clustering algorithm to this problem. We also compare various clustering algorithms and similarity metrics. Experimental results show that our system ac...

متن کامل

LSTC System for Chinese Word Sense Induction

This paper presents the Chinese word sense Induction system of Leshan Teachers’ College. The system participates in the Chinese word sense Induction of task 4 in Back offs organized by the Chinese Information Processing Society of China (CIPS) and SIGHAN. The system extracts neighbor words and their POSs centered in the target words and selected the best one of four cluster algorithms: Simple K...

متن کامل

Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. They both consist of word-centered graphs for each of 1225 different pseudowords, and use respectively first-order co-occurrences and secondorder semantic similarities. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarseg...

متن کامل

Word Sense Induction using Cluster Ensemble

In this paper, we describe the implementation of an unsupervised learning method for Chinese word sense induction in CIPS-SIGHAN-2010 bakeoff. We present three individual clustering algorithms and the ensemble of them, and discuss in particular different approaches to represent text and select features. Our main system based on cluster ensemble achieves 79.33% in F-score, the best result of thi...

متن کامل

Chinese Word Sense Induction based on Hierarchical Clustering Algorithm

Sense induction seeks to automatically identify word senses of polysemous words encountered in a corpus. Unsupervised word sense induction can be viewed as a clustering problem. In this paper, we used the Hierarchical Clustering Algorithm as the classifier for word sense induction. Experiments show the system can achieve 72% F-score about train-corpus and 65% F-score about test-corpus.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010